497 research outputs found

    Reinforcement learning applied to the real world : uncertainty, sample efficiency, and multi-agent coordination

    The immense potential of deep reinforcement learning (DRL) approaches for building autonomous agents has been demonstrated repeatedly over the last decade. Its application to embodied agents, such as robots or automated power systems, nevertheless faces several challenges. Among them, sample inefficiency, combined with the cost and risk of gathering experience in the real world, can deter any attempt to train embodied agents. In this thesis, I focus on the application of DRL to embodied agents. I first propose a probabilistic framework to improve sample efficiency in DRL. In the first article, I present batch inverse-variance (BIV) weighting, a loss function that accounts for label noise variance in heteroscedastic noisy regression. BIV is a key element of the second article, where it is combined with state-of-the-art uncertainty prediction methods for deep neural networks in a Bayesian pipeline for temporal-difference DRL algorithms. This approach, named inverse-variance reinforcement learning (IV-RL), leads to significantly faster training as well as better performance in control tasks. In the third article, multi-agent reinforcement learning (MARL) is applied to the problem of fast-timescale demand response, a promising approach to managing the introduction of intermittent renewable energy sources in power grids. By controlling the coordination of multiple air conditioners, MARL agents achieve significantly better performance than rule-based approaches. These results underline the potential role that DRL-trained embodied agents could play in the energy transition and the fight against global warming
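
To make the inverse-variance idea in this abstract concrete, here is a minimal sketch. It assumes each sample's squared error in a mini-batch is weighted by 1/(label variance + xi) and that the weights are normalized to sum to one. The abstract does not state the formula, so the function name `biv_loss`, the constant `xi`, and the squared-error base loss are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def biv_loss(pred, target, label_var, xi=1e-1):
    """Hypothetical batch inverse-variance weighted squared error.

    label_var holds the (assumed known) noise variance of each label;
    xi is an assumed small constant that bounds the largest weight.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    label_var = np.asarray(label_var, dtype=float)
    # Labels with lower noise variance receive larger weights.
    weights = 1.0 / (label_var + xi)
    # Normalize over the batch so the weights sum to one.
    weights = weights / weights.sum()
    return float(np.sum(weights * (pred - target) ** 2))

# Toy batch: three clean labels and one very noisy label.
pred = np.zeros(4)
target = np.array([0.1, -0.1, 0.1, 5.0])
label_var = np.array([0.01, 0.01, 0.01, 100.0])
print("plain MSE:", np.mean((pred - target) ** 2))
print("BIV-style loss:", biv_loss(pred, target, label_var))
```

In this toy batch the noisy label dominates the plain mean squared error but contributes little to the weighted loss, which is the behavior the abstract attributes to accounting for label noise variance.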

    Nonparametric estimation of the fragmentation kernel based on a PDE stationary distribution approximation

    We consider a stochastic individual-based model in continuous time to describe a size-structured population undergoing cell divisions. This model is motivated by the detection of cellular aging in biology. We address here the problem of nonparametric estimation of the kernel governing the divisions, based on the eigenvalue problem related to the asymptotic behavior of the model in the large-population limit. This inverse problem involves a multiplicative deconvolution operator. Using Fourier techniques, we derive a nonparametric estimator and study its consistency. The main difficulty comes from the non-standard equations connecting the Fourier transforms of the kernel and the parameters of the model. A numerical study is carried out, with special attention paid to the selection of bandwidths by resampling

    Secrecy rate optimization for secure multicast communications

    Recently, physical layer security has been recognized as a new design paradigm to provide security in wireless networks. In contrast to the existing conventional cryptographic methods, physical layer security exploits the dynamics of fading channels to enhance security of wireless communications. This paper studies optimization frameworks for a multicasting network in which a transmitter broadcasts the same information to a group of legitimate users in the presence of multiple eavesdroppers. In particular, power minimization and secrecy rate maximization problems are investigated for a multicasting secrecy network. First, the power minimization problem is solved for different numbers of legitimate users and eavesdroppers. Next, the secrecy rate maximization problem is investigated with the help of private jammers to improve the achievable secrecy rates through a game theoretic approach. These jammers charge the transmitter for their jamming services based on the amount of interference caused to the eavesdroppers. For a fixed interference price scenario, a closed-form solution for the optimal interference requirement to maximize the revenue of the transmitter is derived. This rate maximization problem for a nonfixed interference price scenario is formulated as a Stackelberg game in which the jammers and transmitter are the leaders and follower, respectively. For the proposed game, a Stackelberg equilibrium is derived to maximize the revenues of both the transmitter and the private jammers. To support the derived theoretical results, simulation results are provided with different numbers of legitimate users and eavesdroppers. In addition, these results show that physical layer security based jamming schemes could be incorporated in emerging and future wireless networks to enhance the quality of secure communications

    Secure multicast communications with private jammers

    This paper investigates secrecy rate optimization for a multicasting network, in which a transmitter broadcasts the same information to multiple legitimate users in the presence of multiple eavesdroppers. In order to improve the achievable secrecy rates, private jammers are employed to generate interference to confuse the eavesdroppers. These private jammers charge the legitimate transmitter for their jamming services based on the amount of interference received at the eavesdroppers. Therefore, this secrecy rate maximization problem is formulated as a Stackelberg game, in which the private jammers and the transmitter are the leaders and the follower of the game, respectively. A fixed interference price scenario is considered first, in which a closed-form solution is derived for the optimal amount of interference generated by the jammers to maximize the revenue of the legitimate transmitter. Based on this solution, the Stackelberg equilibrium of the proposed game, at which both legitimate transmitter and the private jammers achieve their maximum revenues, is then derived. Simulation results are also provided to validate these theoretical derivations

    Human-Robot Interaction

    Human-robot interaction (HRI) is a discipline investigating the factors affecting the interactions between humans and robots. It is important to evaluate how the design of interfaces affects the human's ability to perform tasks effectively and efficiently when working with a robot. By understanding the effects of interface design on human performance, workload, and situation awareness, interfaces can be developed to appropriately support the human in performing tasks with minimal errors and with appropriate interaction time and effort. Thus, the results of research on human-robot interfaces have direct implications for the design of robotic systems. For efficient and effective remote navigation of a rover, a human operator needs to be aware of the robot's environment. However, during teleoperation, operators may get information about the environment only through a robot's front-mounted camera, causing a keyhole effect. The keyhole effect reduces situation awareness, which may manifest in navigation issues such as a higher number of collisions, missed critical aspects of the environment, or reduced speed. One way to compensate for the keyhole effect and the ambiguities operators experience when they teleoperate a robot is to add multiple cameras and include the robot chassis in the camera view. Augmented reality, such as overlays, can also enhance the way a person sees objects in the environment or in camera views by making them more visible. Scenes can be augmented with integrated telemetry, procedures, or map information. Furthermore, the addition of an exocentric (i.e., third-person) field of view from a camera placed in the robot's environment may provide operators with the additional information needed to gain spatial awareness of the robot.
    Two research studies investigated possible mitigation approaches to address the keyhole effect: 1) combining the inclusion of the robot chassis in the camera view with augmented reality overlays, and 2) modifying the camera frame of reference. The first study investigated how including or excluding the robot chassis, together with superimposing a simple arrow overlay onto the video feed, affects operator task performance during teleoperation of a mobile robot in a driving task. In this study, the front half of the robot chassis was made visible through the use of three cameras, two side-facing and one forward-facing. The purpose of the second study was to compare operator performance when teleoperating a robot from an egocentric-only view and from a combined (egocentric plus exocentric camera) view. Camera view parameters that are found to be beneficial in these laboratory experiments can be implemented on NASA rovers and tested in a real-world driving and navigation scenario on-site at the Johnson Space Center

    From frequency dispersion to ohmic impedance: A new insight on the high-frequency impedance analysis of electrochemical systems

    The increasing use of impedance for the characterization of an electrified interface is accompanied by the development of accurate models to analyze the results. In the present work, the concept of ohmic impedance is revisited using both numerical simulations and experimental results. The Havriliak-Negami equation is shown to provide a good representation of the high-frequency dispersion or complex ohmic impedance associated with the disk electrode geometry. An excellent fit to simulated complex ohmic impedance was found for both capacitive electrodes and for electrodes characterized by constant-phase-element behavior. The use of the Havriliak-Negami equation to account for the complex ohmic impedance was shown to extend the useful frequency range for regression of physical models to the impedance response for three experimental systems: a gold electrode in a 0.1 M sodium sulfate solution, an aluminum electrode in a 0.01 M sodium sulfate solution, and pure iron in a 0.5 M sulfuric acid solution
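
As a rough illustration of the Havriliak-Negami functional form mentioned in this abstract, the sketch below evaluates Z(omega) = R_HF + (R_LF - R_HF) / (1 + (j omega tau)^alpha)^beta. The general Havriliak-Negami shape is standard, but its parameterization in terms of high- and low-frequency resistances `r_hf` and `r_lf`, as well as the function name and the numerical values, are assumptions made here for illustration; the abstract does not give them.

```python
import numpy as np

def hn_ohmic_impedance(freq_hz, r_hf, r_lf, tau, alpha, beta):
    """Havriliak-Negami-type complex ohmic impedance (assumed parameterization).

    r_hf and r_lf are the assumed high- and low-frequency limits in ohms,
    tau is a characteristic time in seconds, alpha and beta are exponents.
    """
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    # Complex power uses the principal branch, as is usual for this form.
    return r_hf + (r_lf - r_hf) / (1.0 + (1j * omega * tau) ** alpha) ** beta

# Illustrative values only; they are not taken from the paper.
freqs = np.logspace(-2, 6, 5)
z = hn_ohmic_impedance(freqs, r_hf=10.0, r_lf=12.0, tau=1e-3, alpha=0.9, beta=0.8)
for f, zi in zip(freqs, z):
    print(f"{f:10.2e} Hz  Re={zi.real:.4f}  Im={zi.imag:.4f}")
```

With this form the impedance tends to `r_lf` at low frequency and to `r_hf` at high frequency, with a dispersive transition around 1/tau.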

    Statistical deconvolution of the free Fokker-Planck equation at fixed time

    We are interested in reconstructing the initial condition of a non-linear partial differential equation (PDE), namely the Fokker-Planck equation, from the observation of a Dyson Brownian motion at a given time t > 0. The Fokker-Planck equation describes the evolution of electrostatic repulsive particle systems, and can be seen as the large particle limit of correctly renormalized Dyson Brownian motions. The solution of the Fokker-Planck equation can be written as the free convolution of the initial condition and the semi-circular distribution. We propose a nonparametric estimator for the initial condition obtained by performing the free deconvolution via the subordination functions method. This statistical estimator is original as it involves the resolution of a fixed point equation, and a classical deconvolution by a Cauchy distribution. This is due to the fact that, in free probability, the analogue of the Fourier transform is the R-transform, related to the Cauchy transform. In past literature, there has been a focus on the estimation of the initial conditions of linear PDEs such as the heat equation, but to the best of our knowledge, this is the first time that the problem is tackled for a non-linear PDE. The convergence of the estimator is proved and the integrated mean square error is computed, providing rates of convergence similar to the ones known for nonparametric deconvolution methods. Finally, a simulation study illustrates the good performance of our estimator
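
The free-convolution statement in this abstract can be illustrated numerically with a matrix model. This is a minimal sketch under assumptions: it uses a real symmetric (GOE-type) matrix, a two-point initial spectrum, and the hypothetical helper name `dyson_eigenvalues`, none of which come from the paper. At a fixed time t, the eigenvalues of M_0 + sqrt(t) W, with W a normalized Wigner matrix, approximate the free convolution of the initial spectrum with a semicircle law of variance t, so the variances should add.

```python
import numpy as np

def dyson_eigenvalues(initial_eigs, t, rng):
    """Eigenvalues at time t of a symmetric matrix Brownian motion (sketch).

    The starting matrix is diagonal with entries initial_eigs; the noise is
    a Wigner matrix scaled so its spectrum follows a semicircle of variance 1.
    """
    n = len(initial_eigs)
    g = rng.standard_normal((n, n))
    # Symmetrize and scale: off-diagonal entries have variance 1/n.
    wigner = (g + g.T) / np.sqrt(2.0 * n)
    matrix = np.diag(initial_eigs) + np.sqrt(t) * wigner
    return np.linalg.eigvalsh(matrix)

rng = np.random.default_rng(0)
n, t = 400, 1.0
# Illustrative initial condition: half the eigenvalues at -1, half at +1.
initial = np.repeat([-1.0, 1.0], n // 2)
eigs = dyson_eigenvalues(initial, t, rng)
# Under free convolution with a semicircle of variance t, variances add.
print("initial variance:", initial.var())
print("variance at time t:", eigs.var(), "(expected close to", initial.var() + t, ")")
```

The check only confirms the additivity of variances; the deconvolution step that the paper addresses, recovering the initial spectrum from the observed one, is not implemented here.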